Data Analysis¶

Task: Survivor Analysis EDA¶

In [1]:
# standard imports
import pandas as pd
import numpy as np

# Do not change this option; This allows the CodeGrade auto grading to function correctly
pd.set_option('display.max_columns', None)

First, import the data from the survivor.xlsx file, calling the respective DataFrames the same as the sheet name but with lowercase and snake case. For example, the sheet called Castaway Details should be saved as a DataFrame called castaway_details. Make sure that the data files are in the same folder as your notebook.

Note: You may or may not need to install openpyxl for the code below to work. You can use: $ pip install openpyxl

In [2]:
#pip install openpyxl
In [3]:
# import data from Excel

# setup Filename and Object
fileName = "survivor.xlsx"
xls = pd.ExcelFile(fileName)

# import individual sheets /// next will use copies of each df
castaway_details1 = pd.read_excel(xls, 'Castaway Details')
castaways1 = pd.read_excel(xls, 'Castaways')
challenge_description1 = pd.read_excel(xls, 'Challenge Description')
challenge_results1 = pd.read_excel(xls, 'Challenge Results')
confessionals1 = pd.read_excel(xls, 'Confessionals')
hidden_idols1 = pd.read_excel(xls, 'Hidden Idols')
jury_votes1 = pd.read_excel(xls, 'Jury Votes')
tribe_mapping1 = pd.read_excel(xls, 'Tribe Mapping')
viewers1 = pd.read_excel(xls, 'Viewers')
vote_history1 = pd.read_excel(xls, 'Vote History')
season_summary1 = pd.read_excel(xls, 'Season Summary')
season_palettes1 = pd.read_excel(xls, 'Season Palettes')
tribe_colours1 = pd.read_excel(xls, 'Tribe Colours')
In [4]:
#Copies of the original dataframes
castaway_details = castaway_details1.copy()
castaways = castaways1.copy()
challenge_description = challenge_description1.copy()
challenge_results = challenge_results1.copy()
confessionals = confessionals1.copy()
hidden_idols = hidden_idols1.copy() 
jury_votes = jury_votes1.copy() 
tribe_mapping = tribe_mapping1.copy()
viewers = viewers1.copy()
vote_history = vote_history1.copy()
season_summary = season_summary1.copy() 
season_palettes = season_palettes1.copy() 
tribe_colours = tribe_colours1.copy() 
In [5]:
## Checking for the shape of each dataframes

print('castaway_details', castaway_details.shape)
print('castaways', castaways.shape)
print('challenge_description', challenge_description.shape) 
print('challenge_results', challenge_results.shape) 
print('confessionals', confessionals.shape)
print('hidden_idols', hidden_idols.shape)
print('jury_votes',jury_votes.shape)  
print('tribe_mapping', tribe_mapping.shape) 
print('viewers', viewers.shape) 
print('vote_history', vote_history.shape)
print('season_summary', season_summary.shape)
print('season_palettes', season_palettes.shape)
print('tribe_colours', tribe_colours.shape)
castaway_details (608, 10)
castaways (762, 20)
challenge_description (892, 14)
challenge_results (4441, 13)
confessionals (6684, 6)
hidden_idols (159, 10)
jury_votes (933, 7)
tribe_mapping (7322, 8)
viewers (610, 9)
vote_history (4751, 15)
season_summary (41, 20)
season_palettes (205, 3)
tribe_colours (145, 5)

Exercise1: Change every column name of every DataFrame to lowercase and snake case. This is a standard first step for some programmers as lowercase makes it easier to write and snake case makes it easier to copy multiple-word column names.

For example, Castaway Id should end up being castaway_id. You should try doing this using a for loop instead of manually changing the names for each column. It should take you no more than a few lines of code. Use stackoverflow if you need help.

In [6]:
for columns in castaway_details.columns:
    castaway_details.columns =  castaway_details.columns.str.lower()
    castaway_details.columns = castaway_details.columns.str.replace(' ','_') 
    
castaway_details.head()  
Out[6]:
castaway_id full_name short_name date_of_birth date_of_death gender race ethnicity occupation personality_type
0 1 Sonja Christopher Sonja 1937-01-28 NaT Female NaN NaN Musician ENFP
1 2 B.B. Anderson B.B. 1936-01-18 2013-10-29 Male NaN NaN Real Estate Developer ESTJ
2 3 Stacey Stillman Stacey 1972-08-11 NaT Female NaN NaN Attorney ENTJ
3 4 Ramona Gray Ramona 1971-01-20 NaT Female Black NaN Biochemist/Chemist ISTJ
4 5 Dirk Been Dirk 1976-06-15 NaT Male NaN NaN Dairy Farmer ISFP
In [7]:
for columns in castaways.columns:
    castaways.columns =  castaways.columns.str.lower()
    castaways.columns = castaways.columns.str.replace(' ','_') 
    
castaways.head()  
Out[7]:
season_name season full_name castaway_id castaway age city state personality_type episode day order result jury_status original_tribe swapped_tribe swapped_tribe_2 merged_tribe total_votes_received immunity_idols_won
0 Survivor: 41 41 Erika Casupanan 594 Erika 32 Toronta Ontario ENFP 13 26 18 Sole Survivor NaN Luvu NaN NaN Via Kana 2 8
1 Survivor: 41 41 Deshawn Radden 601 Deshawn 26 Miami Florida ENTP 13 26 17 Runner-up NaN Luvu NaN NaN Via Kana 7 6
2 Survivor: 41 41 Xander Hastings 597 Xander 20 Chicago Illinois INFJ 13 26 16 2nd runner-up NaN Yase NaN NaN Via Kana 2 6
3 Survivor: 41 41 Heather Aldret 593 Heather 52 Charleston South Carolina ISFJ 13 25 15 15th voted out 8th jury member Luvu NaN NaN Via Kana 4 6
4 Survivor: 41 41 Ricard Foye 596 Ricard 31 Sedro-Woolley Washington ENTJ 13 24 14 14th voted out 7th jury member Ua NaN NaN Via Kana 9 5
In [8]:
for columns in challenge_description.columns:
    challenge_description.columns =  challenge_description.columns.str.lower()
    challenge_description.columns = challenge_description.columns.str.replace(' ','_') 
    
challenge_description.head() 
Out[8]:
challenge_id challenge_name puzzle race precision endurance strength turn_based balance food knowledge memory fire water
0 CH0001 Quest for Fire False True False False False False False False False False True True
1 CH0002 Bridging the Gap False True False False False False False False False False True True
2 CH0003 Trail Blazer False True False False False False False False False False True False
3 CH0004 Buggin' Out False False False False False False False True False False False False
4 CH0005 Tucker'd Out False True True False False False False True False False False False
In [9]:
for columns in challenge_results.columns:
    challenge_results.columns =  challenge_results.columns.str.lower()
    challenge_results.columns = challenge_results.columns.str.replace(' ','_') 
    
challenge_results.head() 
Out[9]:
season_name season episode day episode_title challenge_name challenge_type outcome_type challenge_id winner_id winner winning_tribe outcome_status
0 Survivor: Borneo 1 1 3 The Marooning Quest for Fire Reward and Immunity Tribal CH0001 2.0 B.B. Pagong Winner
1 Survivor: Borneo 1 1 3 The Marooning Quest for Fire Reward and Immunity Tribal CH0001 4.0 Ramona Pagong Winner
2 Survivor: Borneo 1 1 3 The Marooning Quest for Fire Reward and Immunity Tribal CH0001 6.0 Joel Pagong Winner
3 Survivor: Borneo 1 1 3 The Marooning Quest for Fire Reward and Immunity Tribal CH0001 7.0 Gretchen Pagong Winner
4 Survivor: Borneo 1 1 3 The Marooning Quest for Fire Reward and Immunity Tribal CH0001 8.0 Greg Pagong Winner
In [10]:
for columns in confessionals.columns:
    confessionals.columns =  confessionals.columns.str.lower()
    confessionals.columns = confessionals.columns.str.replace(' ','_') 
    
confessionals.head() 
Out[10]:
season_name season episode castaway castaway_id confessional_count
0 Survivor: 41 41 1 JD 603 11
1 Survivor: 41 41 1 Evvie 598 9
2 Survivor: 41 41 1 Danny 599 8
3 Survivor: 41 41 1 Xander 597 5
4 Survivor: 41 41 1 Deshawn 601 4
In [11]:
for columns in hidden_idols.columns:
    hidden_idols.columns =  hidden_idols.columns.str.lower()
    hidden_idols.columns = hidden_idols.columns.str.replace(' ','_') 
    
hidden_idols.head() 
Out[11]:
season_name season castaway_id castaway idol_number idols_held votes_nullified day_found day_played legacy_advantage
0 Survivor: Guatemala 11 161 Gary 1 1 0.0 24.0 NaN False
1 Survivor: Panama 12 180 Terry 1 1 0.0 NaN NaN False
2 Survivor: Cook Islands 13 202 Yul 1 1 0.0 NaN NaN False
3 Survivor: Fiji 14 218 Yau-Man 1 1 4.0 17.0 36.0 False
4 Survivor: Fiji 14 214 Mookie 1 1 0.0 20.0 NaN False
In [12]:
for columns in jury_votes.columns:
    jury_votes.columns =  jury_votes.columns.str.lower()
    jury_votes.columns = jury_votes.columns.str.replace(' ','_') 
    
jury_votes.head() 
Out[12]:
season_name season castaway finalist vote castaway_id finalist_id
0 Survivor: 41 41 Heather Deshawn 0 593 601
1 Survivor: 41 41 Ricard Deshawn 0 596 601
2 Survivor: 41 41 Danny Deshawn 1 599 601
3 Survivor: 41 41 Liana Deshawn 0 608 601
4 Survivor: 41 41 Shan Deshawn 0 606 601
In [13]:
for columns in tribe_mapping.columns:
    tribe_mapping.columns =  tribe_mapping.columns.str.lower()
    tribe_mapping.columns = tribe_mapping.columns.str.replace(' ','_') 
    
tribe_mapping.head() 
Out[13]:
season_name season episode day castaway_id castaway tribe tribe_status
0 Survivor: Borneo 1 1 3 2.0 B.B. Pagong Original
1 Survivor: Borneo 1 1 3 4.0 Ramona Pagong Original
2 Survivor: Borneo 1 1 3 6.0 Joel Pagong Original
3 Survivor: Borneo 1 1 3 7.0 Gretchen Pagong Original
4 Survivor: Borneo 1 1 3 8.0 Greg Pagong Original
In [14]:
for columns in viewers.columns:
    viewers.columns =  viewers.columns.str.lower()
    viewers.columns = viewers.columns.str.replace(' ','_') 
    
viewers.head() 
Out[14]:
season_name season episode_number_overall episode episode_title episode_date viewers rating_18_49 share_18_49
0 Survivor: 41 41 597.0 1 A New Era 2021-09-22 6.25 1.1 8.0
1 Survivor: 41 41 598.0 2 Juggling Chainsaws 2021-09-29 5.90 1.0 7.0
2 Survivor: 41 41 599.0 3 My Million Dollar Mistake 2021-10-06 5.79 0.9 6.0
3 Survivor: 41 41 600.0 4 They Hate Me Because They Ain't Me 2021-10-13 5.68 0.9 7.0
4 Survivor: 41 41 601.0 5 The Strategist or the Loyalist 2021-10-20 5.62 1.0 7.0
In [15]:
for columns in vote_history.columns:
    vote_history.columns =  vote_history.columns.str.lower()
    vote_history.columns = vote_history.columns.str.replace(' ','_') 
    
vote_history.head() 
Out[15]:
season_name season episode day tribe_status castaway immunity vote nullified voted_out order vote_order castaway_id vote_id voted_out_id
0 Survivor: 41 41 1 3 Original Evvie NaN Abraham False Abraham 1 1 598 591.0 591.0
1 Survivor: 41 41 1 3 Original Liana NaN Abraham False Abraham 1 1 608 591.0 591.0
2 Survivor: 41 41 1 3 Original Tiffany NaN Abraham False Abraham 1 1 604 591.0 591.0
3 Survivor: 41 41 1 3 Original Voce NaN Abraham False Abraham 1 1 607 591.0 591.0
4 Survivor: 41 41 1 3 Original Xander NaN Abraham False Abraham 1 1 597 591.0 591.0
In [16]:
for columns in season_summary.columns:
    season_summary.columns =  season_summary.columns.str.lower()
    season_summary.columns = season_summary.columns.str.replace(' ','_') 
    
season_summary.head() 
Out[16]:
season_name season location country tribe_setup full_name winner_id winner runner_ups final_vote timeslot premiered ended filming_started filming_ended viewers_premier viewers_finale viewers_reunion viewers_mean rank
0 Survivor: Borneo 1 Pulau Tiga, Sabah, Malaysia Malaysia Two tribes of eight new players Richard Hatch 16 Richard Kelly Wiglesworth 4-3 Wednesday 8:00 pm 2000-05-31 2000-08-23 2000-03-13 2000-04-20 15.51 51.69 36.70 28.30 2.0
1 Survivor: The Australian Outback 2 Herbert River at Goshen Station, Queensland, A... Australia Two tribes of eight new players Tina Wesson 32 Tina Colby Donaldson 4-3 Thursday 8:00 pm 2001-01-28 2001-05-03 2000-10-23 2000-12-03 45.37 36.35 28.01 29.80 1.0
2 Survivor: Africa 3 Shaba National Reserve, Kenya Kenya Two tribes of eight new players Ethan Zohn 48 Ethan Kim Johnson 5-2 Thursday 8:00 pm 2001-10-11 2002-01-10 2001-07-11 2001-08-18 23.84 27.26 19.05 20.69 8.0
3 Survivor: Marquesas 4 Nuku Hiva, Marquesas Islands, French Polynesia Polynesia Two tribes of eight new players Vecepia Towery 64 Vecepia Neleh Dennis 4-3 Thursday 8:00 pm 2002-02-28 2002-05-19 2001-11-12 2001-12-20 23.19 25.87 19.05 20.77 6.0
4 Survivor: Thailand 5 Ko Tarutao, Satun Province, Thailand Thailand Two tribes of eight new players; picked by the... Brian Heidik 80 Brian Clay Jordan 4-3 Thursday 8:00 pm 2002-09-19 2002-12-19 2002-06-10 2002-07-18 23.05 24.08 20.43 21.21 4.0
In [17]:
for columns in season_palettes.columns:
    season_palettes.columns =  season_palettes.columns.str.lower()
    season_palettes.columns = season_palettes.columns.str.replace(' ','_') 
    
season_palettes.head() 
Out[17]:
season_name season palette
0 Survivor: 41 41 #ABC9E4
1 Survivor: 41 41 #6990A7
2 Survivor: 41 41 #1D4164
3 Survivor: 41 41 #F3A92B
4 Survivor: 41 41 #E8E086
In [18]:
for columns in tribe_colours.columns:
    tribe_colours.columns =  tribe_colours.columns.str.lower()
    tribe_colours.columns = tribe_colours.columns.str.replace(' ','_') 
    
tribe_colours.head() 
Out[18]:
season_name season tribe tribe_colour tribe_status
0 Survivor: Borneo 1 Pagong #FFFF05 original
1 Survivor: Borneo 1 Rattana #7CFC00 merged
2 Survivor: Borneo 1 Tagi #FF9900 original
3 Survivor: The Australian Outback 2 Barramundi #FF6600 merged
4 Survivor: The Australian Outback 2 Kucha #32CCFF original

Q2: Who is the oldest contestant that is still alive? We want to look at their current age and NOT the age they were when they played their season. Select their row from the castaway_details DataFrame and save this as Q2. This should return a DataFrame and the index and missing values should be left as is.

Important: Remember to show your work (i.e. how you found this answer by using Python/Pandas/Numpy). See note in the instructions above. This is true for all of the following questions as well.

In [19]:
### ENTER CODE HERE ###
Q2 = castaway_details[pd.isnull(castaway_details['date_of_death'])]
Q2 = Q2[Q2['date_of_birth'] == Q2['date_of_birth'].min()]
Q2
Out[19]:
castaway_id full_name short_name date_of_birth date_of_death gender race ethnicity occupation personality_type
0 1 Sonja Christopher Sonja 1937-01-28 NaT Female NaN NaN Musician ENFP

Q3: What contestant was the oldest at the time of their season? We want to look at their age at the time of the season and NOT their current age. Select their row from the castaway_details DataFrame and save this as Q3. This should return a DataFrame and the index and missing values should be left as is.

In [20]:
x = castaways.groupby(['season','castaway_id'])['age'].max()
y = x.loc[x.groupby(level=0).idxmax()]
y = y.to_frame().reset_index()
y = y['castaway_id'].to_list()
Q3 = castaway_details[castaway_details['castaway_id'].isin(y)]
print(Q3)
     castaway_id         full_name short_name date_of_birth date_of_death  \
13            14       Rudy Boesch       Rudy    1928-01-20    2019-11-01   
27            28    Rodger Bingham     Rodger    1947-07-05           NaT   
31            32       Tina Wesson       Tina    1960-12-26           NaT   
46            47       Kim Johnson     Kim J.    1944-09-18           NaT   
60            61   Paschal English    Paschal    1945-03-05           NaT   
74            75  Jake Billingsley       Jake    1941-08-21           NaT   
86            87      Roger Sexton      Roger    1946-09-26           NaT   
98            99    Lillian Morris    Lillian    1952-04-03           NaT   
127          128   Scout Cloud Lee      Scout    1944-11-08           NaT   
136          137     Willard Smith    Willard    1947-11-30           NaT   
150          151         Jim Lynch        Jim    1942-01-07           NaT   
175          176     Bruce Kanegai      Bruce    1947-12-17           NaT   
179          180       Terry Deitz      Terry    1959-10-10           NaT   
182          183       Sekou Bunch      Sekou    1960-09-09           NaT   
205          206    Gary Stritesky       Gary    1951-09-16           NaT   
217          218      Yau-Man Chan    Yau-Man    1952-08-26           NaT   
221          222      Steve Morris    Chicken    1959-09-02           NaT   
248          249    Gillian Larson    Gillian    1947-02-24           NaT   
257          258      Randy Bailey      Randy    1959-03-31           NaT   
268          269      Sandy Burgin      Sandy    1955-07-01           NaT   
282          283      Mike Borassi       Mike    1947-03-13           NaT   
303          304     Jimmy Johnson   Jimmy J.    1943-07-16           NaT   
336          337  Phillip Sheppard    Phillip    1958-03-12           NaT   
350          351       Rick Nelson       Rick    1959-10-24           NaT   
363          364    Troy Robertson    Troyzan    1961-07-30           NaT   
365          366        Greg Smith     Tarzan    1947-12-12           NaT   
379          380   Artis Silvester      Artis    1959-04-18           NaT   
385          386    Denise Stapley     Denise    1971-01-01           NaT   
419          420     Trish Hegarty      Trish    1965-10-09           NaT   
429          430    Dale Wentworth       Dale    1959-03-07           NaT   
458          459    Carolyn Rivera    Carolyn    1961-10-07           NaT   
473          474  Joseph Del Campo        Joe    1943-07-04           NaT   
480          481      Paul Wachter       Paul    1963-10-10           NaT   
498          499     Katrina Radke    Katrina    1970-12-17           NaT   
532          533    Angela Perkins     Angela    1974-11-02           NaT   
541          542      Natalie Cole    Natalie    1961-02-26           NaT   
556          557         Reem Daly       Reem    1971-07-08           NaT   
574          575       Tom Laidlaw        Tom    1958-04-15           NaT   
592          593    Heather Aldret    Heather    1969-03-01           NaT   

     gender   race          ethnicity                           occupation  \
13     Male    NaN                NaN                    Retired Navy SEAL   
27     Male    NaN                NaN                       Teacher/Farmer   
31   Female    NaN                NaN  Personal Nurse;Motivational Speaker   
46   Female    NaN                NaN                      Retired Teacher   
60     Male    NaN                NaN                                Judge   
74     Male    NaN                NaN                          Land Broker   
86     Male    NaN                NaN            Construction Company V.P.   
98   Female    NaN                NaN                   Scout Troop Leader   
127  Female    NaN                NaN                              Rancher   
136    Male    NaN                NaN                               Lawyer   
150    Male    NaN                NaN                 Retired Fire Captain   
175    Male  Asian  Japanese American                          Art Teacher   
179    Male    NaN                NaN                                Pilot   
182    Male  Black                NaN                        Jazz Musician   
205    Male    NaN                NaN                    School Bus Driver   
217    Male  Asian   Bornean American                    Computer Engineer   
221    Male    NaN                NaN                       Chicken Farmer   
248  Female    NaN                NaN                        Retired Nurse   
257    Male    NaN                NaN                 Wedding Videographer   
268  Female    NaN                NaN                           Bus Driver   
282    Male    NaN                NaN                        Personal Chef   
303    Male    NaN                NaN                     Former NFL Coach   
336    Male  Black                NaN  Former Federal Agent;Software Sales   
350    Male    NaN                NaN                              Rancher   
363    Male    NaN                NaN                Swimsuit Photographer   
365    Male    NaN                NaN                      Plastic Surgeon   
379    Male  Black                NaN                    Computer Engineer   
385  Female    NaN                NaN                        Sex Therapist   
419    Male    NaN                NaN                      Pilates Trainer   
429    Male    NaN                NaN                               Farmer   
458  Female    NaN                NaN                  Corporate Executive   
473    Male    NaN                NaN                     Former FBI Agent   
480    Male    NaN                NaN                        Boat Mechanic   
498  Female    NaN                NaN                             Olympian   
532  Female    NaN                NaN                         Army Veteran   
541  Female  Black                NaN                       Publishing CEO   
556  Female    NaN                NaN                 Sales Representative   
574    Male    NaN                NaN                    Former NHL Player   
592  Female    NaN                NaN                     Stay-at-home Mom   

    personality_type  
13              ISTJ  
27              ISFJ  
31              ESFJ  
46              ISFJ  
60              ISFJ  
74              ISFJ  
86              ISTJ  
98              INFP  
127             INFJ  
136             INTP  
150             ISTJ  
175             INFP  
179             ESTJ  
182             ESFJ  
205             ISFJ  
217             INTP  
221             ISFP  
248             ENFP  
257             INTJ  
268             ESFP  
282             ESTP  
303             ESFJ  
336             ESTJ  
350             ISFJ  
363             ESFP  
365             INTP  
379             ISTJ  
385             INFJ  
419             ENFJ  
429             ISTJ  
458             ENTP  
473             ISTJ  
480             ESTJ  
498             ENFP  
532             ISTJ  
541             ESTJ  
556             ESTP  
574             ESFJ  
592             ISFJ  

Q4: What contestant played in the most number of seasons? Select their row from the castaway_details DataFrame and save this as Q4. This should return a DataFrame and the index and missing values should be left as is.

In [21]:
combined1 = castaway_details.merge(castaways, how='right', left_on= 'castaway_id', right_on='castaway_id')
combined3 = combined1.groupby(['castaway_id','full_name_x'],as_index=False)[['season']].nunique().sort_values(by = 'season', ascending = False)
combined3 = combined3.reset_index(drop = True)
combined3
most_played = combined3['castaway_id'].head(1).to_list()
Q4 = castaway_details[castaway_details['castaway_id'].isin(most_played)]
print(Q4)
    castaway_id    full_name  short_name date_of_birth date_of_death gender  \
54           55  Rob Mariano  Boston Rob    1975-12-25           NaT   Male   

   race ethnicity           occupation personality_type  
54  NaN       NaN  Construction Worker             ESTJ  

Q5: Create a DataFrame of all the contestants that won their season (aka their final result in the castaways DataFrame was the 'Sole Survivor'). Call this DataFrame sole_survivor. Note that contestants may appear more than one time in this DataFrame if they won more than one season. Make sure that the index goes from 0 to n-1 and that the DataFrame is sorted ascending by season number.

The DataFrame should have the same columns, and the columns should be in the same order, as the castaways DataFrame.

In [22]:
sole_survivor = castaways[castaways['result'] == 'Sole Survivor'].sort_values(by = 'season')
print(sole_survivor)
                                   season_name  season           full_name  \
746                           Survivor: Borneo       1       Richard Hatch   
730           Survivor: The Australian Outback       2         Tina Wesson   
714                           Survivor: Africa       3          Ethan Zohn   
698                        Survivor: Marquesas       4      Vecepia Towery   
682                         Survivor: Thailand       5        Brian Heidik   
666                       Survivor: The Amazon       6       Jenna Morasca   
648                    Survivor: Pearl Islands       7   Sandra Diaz-Twine   
630                        Survivor: All-Stars       8        Amber Brkich   
612                          Survivor: Vanuatu       9     Chris Daugherty   
592                            Survivor: Palau      10         Tom Westman   
574                        Survivor: Guatemala      11    Danni Boatwright   
558                           Survivor: Panama      12     Aras Baskauskas   
538                     Survivor: Cook Islands      13            Yul Kwon   
519                             Survivor: Fiji      14           Earl Cole   
503                            Survivor: China      15         Todd Herzog   
483                       Survivor: Micronesia      16     Parvati Shallow   
465                            Survivor: Gabon      17      Robert Crowley   
449                        Survivor: Tocantins      18    James Thomas Jr.   
429                            Survivor: Samoa      19       Natalie White   
409              Survivor: Heroes vs. Villains      20   Sandra Diaz-Twine   
389                        Survivor: Nicaragua      21           Jud Birza   
369                Survivor: Redemption Island      22         Rob Mariano   
349                    Survivor: South Pacific      23       Sophie Clarke   
331                        Survivor: One World      24        Kim Spradlin   
313                      Survivor: Philippines      25      Denise Stapley   
293                         Survivor: Caramoan      26        John Cochran   
270                  Survivor: Blood vs. Water      27       Tyson Apostol   
252                          Survivor: Cagayan      28        Tony Vlachos   
234                 Survivor: San Juan del Sur      29    Natalie Anderson   
216                     Survivor: Worlds Apart      30       Mike Holloway   
196                         Survivor: Cambodia      31      Jeremy Collins   
178                        Survivor: Kaoh Rong      32  Michele Fitzgerald   
158            Survivor: Millennials vs. Gen X      33          Adam Klein   
138                    Survivor: Game Changers      34        Sarah Lacina   
120  Survivor: Heroes vs. Healers vs. Hustlers      35      Ben Driebergen   
100                     Survivor: Ghost Island      36     Wendell Holland   
80                 Survivor: David vs. Goliath      37         Nick Wilson   
60                Survivor: Edge of Extinction      38     Chris Underwood   
40               Survivor: Island of the Idols      39       Tommy Sheehan   
18                    Survivor: Winners at War      40        Tony Vlachos   
0                                 Survivor: 41      41     Erika Casupanan   

     castaway_id    castaway  age                  city           state  \
746           16     Richard   39               Newport    Rhode Island   
730           32        Tina   40             Knoxville       Tennessee   
714           48       Ethan   27             Lexington   Massachusetts   
698           64     Vecepia   36               Hayward      California   
682           80       Brian   34           Quartz Hill      California   
666           96       Jenna   21           Bridgeville    Pennsylvania   
648          112      Sandra   29            Fort Lewis      Washington   
630           27       Amber   25                Beaver    Pennsylvania   
612          130       Chris   33          South Vienna            Ohio   
592          150         Tom   40              Sayville        New York   
574          166       Danni   29            Tonganoxie          Kansas   
558          182        Aras   24          Santa Monica      California   
538          202         Yul   31             San Mateo      California   
519          221        Earl   35          Santa Monica      California   
503          237        Todd   22        Pleasant Grove            Utah   
483          197     Parvati   25           Los Angeles      California   
465          265         Bob   57        South Portland           Maine   
449          281        J.T.   24                Mobile         Alabama   
429          301     Natalie   26             Van Buren        Arkansas   
409          112      Sandra   35          Fayetteville  North Carolina   
389          321       Fabio   21                Venice      California   
369           55  Boston Rob   34             Pensacola         Florida   
349          353      Sophie   22             Willsboro        New York   
331          371         Kim   29           San Antonio           Texas   
313          386      Denise   41          Cedar Rapids            Iowa   
293          348     Cochran   25            Washington            D.C.   
270          274       Tyson   34                 Provo            Utah   
252          424        Tony   39           Jersey City      New Jersey   
234          442     Natalie   28             Edgewater      New Jersey   
216          460        Mike   38  North Richland Hills           Texas   
196          433      Jeremy   37               Foxboro   Massachusetts   
178          478     Michele   24              Freehold      New Jersey   
158          498        Adam   25         San Francisco      California   
138          414       Sarah   32                Marion            Iowa   
120          516         Ben   34                 Boise           Idaho   
100          536     Wendell   33          Philadelphia    Pennsylvania   
80           556        Nick   27          Williamsburg        Kentucky   
60           559       Chris   25            Greenville  South Carolina   
40           590       Tommy   26            Long Beach        New York   
18           424        Tony   45             Allendale      New Jersey   
0            594       Erika   32               Toronta         Ontario   

    personality_type  episode  day  order         result jury_status  \
746             ENTP       14   39     16  Sole Survivor         NaN   
730             ESFJ       16   42     16  Sole Survivor         NaN   
714             ISFP       15   39     16  Sole Survivor         NaN   
698             ISTJ       15   39     16  Sole Survivor         NaN   
682             ISTP       15   39     16  Sole Survivor         NaN   
666             ISTP       15   39     16  Sole Survivor         NaN   
648             ESTP       15   39     18  Sole Survivor         NaN   
630             ISFP       17   39     18  Sole Survivor         NaN   
612             ENTP       15   39     18  Sole Survivor         NaN   
592             ESTJ       15   39     20  Sole Survivor         NaN   
574             ENFJ       15   39     18  Sole Survivor         NaN   
558             INFP       16   39     16  Sole Survivor         NaN   
538             INTJ       16   39     20  Sole Survivor         NaN   
519             INFJ       15   39     19  Sole Survivor         NaN   
503             ENFP       15   39     16  Sole Survivor         NaN   
483             ENFJ       15   39     20  Sole Survivor         NaN   
465             INTP       14   39     18  Sole Survivor         NaN   
449             ESTP       15   39     16  Sole Survivor         NaN   
429             ISFJ       16   39     20  Sole Survivor         NaN   
409             ESTP       15   39     20  Sole Survivor         NaN   
389             ESFP       16   39     20  Sole Survivor         NaN   
369             ESTJ       15   39     20  Sole Survivor         NaN   
349             INTJ       16   39     20  Sole Survivor         NaN   
331             INFJ       15   39     18  Sole Survivor         NaN   
313             INFJ       15   39     18  Sole Survivor         NaN   
293             INTP       15   39     20  Sole Survivor         NaN   
270             ESTP       15   39     23  Sole Survivor         NaN   
252             ESTP       14   39     18  Sole Survivor         NaN   
234             ESTP       15   39     18  Sole Survivor         NaN   
216             ESTP       15   39     18  Sole Survivor         NaN   
196             ESTJ       15   39     20  Sole Survivor         NaN   
178             ESFP       15   39     18  Sole Survivor         NaN   
158             ENTP       14   39     20  Sole Survivor         NaN   
138             ENTJ       14   39     20  Sole Survivor         NaN   
120             ESFP       14   39     18  Sole Survivor         NaN   
100             INFJ       14   39     20  Sole Survivor         NaN   
80              ENTP       14   39     20  Sole Survivor         NaN   
60              ENTP       14   39     20  Sole Survivor         NaN   
40              ENFJ       14   39     20  Sole Survivor         NaN   
18              ESTP       15   39     22  Sole Survivor         NaN   
0               ENFP       13   26     18  Sole Survivor         NaN   

    original_tribe swapped_tribe swapped_tribe_2  merged_tribe  \
746           Tagi           NaN             NaN       Rattana   
730         Ogakor           NaN             NaN    Barramundi   
714          Boran         Boran             NaN     Moto Maji   
698        Maraamu          Rotu             NaN      Soliantu   
682     Chuay Gahn           NaN             NaN     Chuay Jai   
666         Jaburu        Jaburu             NaN        Jacaré   
648          Drake         Drake             NaN        Balboa   
630        Chapera       Chapera         Chapera  Chaboga Mogo   
612         Lopevi        Lopevi             NaN        Alinta   
592          Koror           NaN             NaN         Koror   
574          Nakúm         Yaxhá             NaN        Xhakúm   
558        Viveros        Casaya             NaN       Gitanos   
538      Puka Puka      Aitutaki        Aitutaki     Aitutonga   
519           Ravu          Moto             NaN     Bula Bula   
503       Fei Long      Fei Long             NaN   Hae Da Fung   
483        Malakal         Airai             NaN          Dabu   
465           Kota          Kota            Kota         Nobag   
449        Jalapao           NaN             NaN         Forza   
429        Foa Foa           NaN             NaN          Aiga   
409       Villains           NaN             NaN      Yin Yang   
389        La Flor       La Flor             NaN      Libertad   
369        Ometepe           NaN             NaN      Murlonio   
349          Upolu           NaN             NaN       Te Tuna   
331         Salani        Salani             NaN       Tikiano   
313        Matsing       Kalabaw             NaN     Dangrayne   
293          Bikal         Bikal             NaN     Enil Edam   
270         Galang       Tadhana             NaN        Kasama   
252         Aparri        Solana             NaN     Solarrion   
234        Hunahpu       Hunahpu             NaN        Huyopa   
216       Escameca      Escameca             NaN        Merica   
196          Bayon         Bayon           Bayon         Orkun   
178         Gondol      Chan Loh             NaN          Dara   
158          Vanua        Takali             NaN        Vinaka   
138           Nuku         Tavua             NaN     Maku Maku   
120           Levu          Yawa             NaN        Solewa   
100         Naviti        Naviti          Yanuya        Lavita   
80           David        Jabeni             NaN      Kalokalo   
60            Manu           NaN             NaN          Vata   
40           Vokai         Vokai             NaN      Lumuwaku   
18           Dakal         Dakal             NaN          Koru   
0             Luvu           NaN             NaN      Via Kana   

     total_votes_received  immunity_idols_won  
746                     6                   4  
730                     0                   2  
714                     0                   4  
698                     2                   4  
682                     0                   8  
666                     3                   7  
648                     0                   3  
630                     6                   6  
612                     3                   6  
592                     0                  12  
574                     1                   5  
558                     9                   4  
538                     5                   6  
519                     1                   5  
503                     5                   4  
483                     4                   7  
465                     2                   8  
449                     0                   5  
429                     8                   1  
409                     3                   4  
389                     2                   8  
369                     7                   9  
349                     5                   6  
331                     3                   6  
313                     6                   1  
293                     0                   7  
270                     2                   8  
252                     5                   6  
234                     0                   6  
216                     4                   7  
196                     3                   7  
178                     1                   7  
158                     6                   6  
138                     0                   4  
120                    11                   4  
100                     5                   8  
80                      0                   4  
60                      9                   1  
40                      2                   6  
18                      0                   9  
0                       2                   8  

Q6: Have any contestants won more than one time? If so, select their records from the sole_survivor DataFrame, sorting the rows by season. Save this as Q6. If no contestant has won twice, save Q6 as the string None.

In [23]:
sole_survivor.groupby('full_name', as_index=False)['season'].count().sort_values(by = 'season', ascending = False)
Q6 = sole_survivor[(sole_survivor['full_name'] == 'Sandra Diaz-Twine') | (sole_survivor['full_name'] == 'Tony Vlachos')].sort_values(by = 'season')
print(Q6)
                       season_name  season          full_name  castaway_id  \
648        Survivor: Pearl Islands       7  Sandra Diaz-Twine          112   
409  Survivor: Heroes vs. Villains      20  Sandra Diaz-Twine          112   
252              Survivor: Cagayan      28       Tony Vlachos          424   
18        Survivor: Winners at War      40       Tony Vlachos          424   

    castaway  age          city           state personality_type  episode  \
648   Sandra   29    Fort Lewis      Washington             ESTP       15   
409   Sandra   35  Fayetteville  North Carolina             ESTP       15   
252     Tony   39   Jersey City      New Jersey             ESTP       14   
18      Tony   45     Allendale      New Jersey             ESTP       15   

     day  order         result jury_status original_tribe swapped_tribe  \
648   39     18  Sole Survivor         NaN          Drake         Drake   
409   39     20  Sole Survivor         NaN       Villains           NaN   
252   39     18  Sole Survivor         NaN         Aparri        Solana   
18    39     22  Sole Survivor         NaN          Dakal         Dakal   

    swapped_tribe_2 merged_tribe  total_votes_received  immunity_idols_won  
648             NaN       Balboa                     0                   3  
409             NaN     Yin Yang                     3                   4  
252             NaN    Solarrion                     5                   6  
18              NaN         Koru                     0                   9  

Q7: Using value_counts(), what is the normalized relative frequencies (percentage) breakdown of gender for all the contestants that have played before? Count someone who played in multiple seasons only once. Round the results to 3 decimal places.

In [24]:
Q7 = castaway_details[castaway_details['full_name'] == castaway_details['full_name'].unique()]
Q7 = round(Q7['gender'].value_counts(normalize='True'),3)
print(Q7)
Male          0.502
Female        0.497
Non-binary    0.002
Name: gender, dtype: float64

Q8:

  • What percentage of times has a male won his season? Save this percentage as Q8A.
  • What percentage of time has a female won her season? Save this percentage as Q8B.
  • Note: Round all percentages to two decimal points and write as a float (example: 55.57).
  • Note 2: If a contestant has won twice, count each win separately.
In [25]:
combined4 = castaway_details.merge(sole_survivor, how='inner', left_on= 'castaway_id', right_on='castaway_id')
In [26]:
combined4['gender'].value_counts()
Out[26]:
Male      25
Female    16
Name: gender, dtype: int64
In [27]:
Q8A = round(25/41*100,2)
Q8B = round(16/41*100,2)
print(Q8A,Q8B)
60.98 39.02

Q9: What is the average age of contestants when they appeared on the show? Save this as Q9. Round to nearest integer.

In [28]:
Q9 = round(castaways['age'].mean(),0)
Q9 = int(Q9)
print(Q9)
33

Q10: Let's say we wanted to analyze the types of occupations that make a good winner on Survivor. Create a DataFrame that includes the occupation of every winner (sorted by season). If a contestant won more than one time, the occupation should only appear once for the first time the contestant was on the show. The DataFrame index will be the respective season number and should be called Q10.

Code Check: The first five rows of the Q10 DataFrame should look similar to this:

season occupation
1 Corporate Trainer
2 Personal Nurse; Motivational Speaker
3 Professional Soccer Player;Social Entrepreneur...
4 Office Manager
5 Used Car Salesman
In [62]:
Q10 = combined4.groupby('castaway_id',as_index = False)[['season','occupation']].min()
Q10 = Q10[['season','occupation']].sort_values(by='season').reset_index(drop=True)
print(Q10)
    season                                         occupation
0        1                                  Corporate Trainer
1        2                Personal Nurse;Motivational Speaker
2        3  Professional Soccer Player;Social Entrepreneur...
3        4                                     Office Manager
4        5                                  Used Car Salesman
5        6                                     Swimsuit Model
6        7                      Office Assistant;Case Manager
7        8  Administrative Assistant;Director of Marketing...
8        9                        Highway Construction Worker
9       10               NYC Firefighter;Motivational Speaker
10      11           Sports Radio Host;Owner of Sideline Chic
11      12                           Yoga Instructor;Musician
12      13           Management Consultant;Product Management
13      14                                       Ad Executive
14      15                                   Flight Attendant
15      16  Boxer;Charity Organizer;Yoga Teacher/Life Coac...
16      17                                    Physics Teacher
17      18                                     Cattle Rancher
18      19                               Pharmaceutical Sales
19      21                                            Student
20      22                                Construction Worker
21      23              Medical Student;Healthcare Consultant
22      24                Bridal Shop Owner;Interior Designer
23      25                                      Sex Therapist
24      26                                Harvard Law Student
25      27  Former Professional Cyclist/Missionary;Shop Ma...
26      28                                     Police Officer
27      29  Crossfit Coach/Physical Therapy Student;CrossF...
28      30                                        Oil Driller
29      31                              Cambridge Firefighter
30      32             Bartender;Business Development Manager
31      33  Homeless Shelter Manager;Keynote Speaker and Host
32      34                                     Police Officer
33      35                Marine;Real Estate/Stay-at-home Dad
34      36         Furniture Company Owner;Furniture Designer
35      37                           Public Defender;Attorney
36      38                             District Sales Manager
37      39                                  4th Grade Teacher
38      41                             Communications Manager

Q11: Who played the most total number of days of Survivor? If a contestant appeared on more than one season, you would add their total days for each season together. Save the top five contestants in terms of total days played as a DataFrame and call it Q11, sorted in descending order by total days played.

The following columns should be included: castaway_id, full_name, and total_days_played where total_days_played is the sum of all days a contestant played. The index should go from 0 to n-1.

Note: Be careful because on some seasons, the contestant was allowed to come back into the game after being voted off. Take a look at Season 23's contestant Oscar Lusth in the castaways DataFrame as an example. He was voted out 7th and then returned to the game. He was then voted out 9th and returned to the game a second time. He was then voted out 17th the final time. Be aware of this in your calculations and make sure you are counting the days according to the last time they were voted off or won.

In [30]:
Q11 = castaways.groupby(['season','castaway_id','full_name'],as_index=False)['day'].max().sort_values(by='day',ascending=False)
Q11 = Q11.groupby(['castaway_id','full_name'],as_index=False)['day'].sum().sort_values(by='day',ascending=False)
Q11 = Q11.rename(columns={'day':'total_days_played'})
print(Q11)
     castaway_id          full_name  total_days_played
55            55        Rob Mariano                131
198          197    Parvati Shallow                130
202          201        Oscar Lusth                128
179          179       Cirie Fields                121
112          112  Sandra Diaz-Twine                110
..           ...                ...                ...
133          133      Jolanda Jones                  3
0              1  Sonja Christopher                  3
131          131     Jonathan Libby                  2
132          132        Wanda Shirk                  2
195          195       Candice Cody                  1

[611 rows x 3 columns]

Q12A & Q12B: What is the percentage of total extroverts and introverts that have played the game (count players only once even if they have played in more than one season). Save these percentages as Q12A and Q12B respectively. Note: Round all percentages to two decimal points and write as a float (example: 55.57).

For more information on personality types check this Wikipedia article.

In [31]:
total = combined1.groupby(['castaway_id','personality_type_x'],as_index=False)['season'].sum()
count_total = total['castaway_id'].count()
extroverts = total[total['personality_type_x'].str.contains('E')]
count_extroverts = extroverts['personality_type_x'].count()
Q12A = round(count_extroverts/count_total*100,2)
print(Q12A)
53.63
In [32]:
Q12B = 100 - Q12A
print(Q12B)
46.37

Q13A & Q13B: Now that we know the percentages of total players that are extroverted and introverted, let's see if that made a difference in terms of who actually won their season.¶

What is the percentage of total extroverts and introverts that have won the game (count players only once even if they have won more than one season)? Save these percentages as Q13A and Q13B respectively. Note: Round all percentages to two decimal points and write as a float (example: 55.57).

In [33]:
count_total = combined4['castaway_id'].nunique()
extroverts = combined4[combined4['personality_type_x'].str.contains('E')]
count_extroverts = extroverts['castaway_id'].nunique()
Q13A = round(count_extroverts/count_total*100,2)
print(Q13A)
61.54
In [34]:
Q13B = 100 - Q13A
print(Q13B)
38.46

Q14: Which contestants have never received a tribal council vote (i.e. a vote to be voted out of the game as shown in the vote_id column in the vote_history DataFrame)? Note that there are various reasons for a contestant to not receive a tribal vote: they quit, made it to the end, medical emergency, etc. Select their rows from the castaway_details DataFrame and save this as Q14 in ascending order by castaway_id. This should return a DataFrame and the index and missing values should be left as is.

In [35]:
not_voted = castaways.groupby('castaway_id',as_index=False)['total_votes_received'].sum()
not_voted = not_voted[not_voted['total_votes_received'] == 0]
not_voted_list = not_voted['castaway_id'].to_list()
Q14 = castaway_details[castaway_details['castaway_id'].isin(not_voted_list)]
print(Q14)
     castaway_id          full_name    short_name date_of_birth date_of_death  \
21            22     Michael Skupin       Michael    1962-01-29           NaT   
60            61    Paschal English       Paschal    1945-03-05           NaT   
79            80       Brian Heidik         Brian    1968-03-09           NaT   
130          131     Jonathan Libby      Jonathan    1981-09-05           NaT   
131          132        Wanda Shirk         Wanda    1949-08-24           NaT   
205          206     Gary Stritesky          Gary    1951-09-16           NaT   
241          242  Kathleen Sleckman         Kathy    1962-08-14           NaT   
282          283       Mike Borassi          Mike    1947-03-13           NaT   
313          314        Kelly Shinn  Purple Kelly    1990-05-05           NaT   
353          354      Kourtney Moon      Kourtney    1982-02-27           NaT   
374          375       Dana Lambert          Dana    1979-12-13           NaT   
384          385      Lisa Whelchel          Lisa    1963-05-29           NaT   
411          412       Lindsey Ogle       Lindsey    1983-09-04           NaT   
467          468      Neal Gottlieb          Neal    1977-02-03           NaT   
536          537         Pat Cusack           Pat    1977-02-25           NaT   
539          540          Bi Nguyen            Bi    1989-10-30           NaT   
552          553           Kara Kay          Kara    1987-11-20           NaT   
569          570      Gavin Whitson         Gavin    1994-07-20           NaT   

     gender   race ethnicity  \
21     Male    NaN       NaN   
60     Male    NaN       NaN   
79     Male    NaN       NaN   
130    Male    NaN       NaN   
131  Female    NaN       NaN   
205    Male    NaN       NaN   
241  Female    NaN       NaN   
282    Male    NaN       NaN   
313  Female    NaN       NaN   
353  Female    NaN       NaN   
374  Female    NaN       NaN   
384  Female    NaN       NaN   
411  Female    NaN       NaN   
467    Male    NaN       NaN   
536    Male    NaN       NaN   
539  Female  Asian       NaN   
552  Female    NaN       NaN   
569    Male    NaN       NaN   

                                            occupation personality_type  
21   Software Publisher;Part-Time Professional Speaker             ESFJ  
60                                               Judge             ISFJ  
79                                   Used Car Salesman             ISTP  
130                        Sales & Marketing Associate             ISTP  
131                                    English Teacher             ENFP  
205                                  School Bus Driver             ISFJ  
241                                 Golf Course Vendor             ENFP  
282                                      Personal Chef             ESTP  
313                                    Nursing Student             ENFP  
353                                  Motorcycle Repair             ISFP  
374                                      Cosmetologist             ISTP  
384                                Former TV Teen Star             INFP  
411                                        Hairstylist             ESFP  
467                             Ice Cream Entrepreneur             INTP  
536                                Maintenance Manager             ESTP  
539                                        MMA Fighter             ISFP  
552                                            Realtor             ENFJ  
569                              YMCA Program Director             ISFJ  

Q15: What contestant has won the most number of challenges? Select their row from the castaway_details DataFrame and save this as Q15. This should return a DataFrame and the index and missing values should be left as is.

In [36]:
x = challenge_results.groupby(['winner_id','winner'],as_index = False)['winning_tribe'].count().sort_values(by = 'winning_tribe',ascending=False)
combined6 = tribe_mapping.merge(x, how='right', left_on= 'castaway', right_on='winner')
most_wins = combined6.groupby('castaway_id',as_index=False)['winning_tribe'].max().sort_values(by='winning_tribe',ascending=False).head(1)
most_wins = most_wins['castaway_id'].to_list()
Q15 = castaway_details[castaway_details['castaway_id'].isin(most_wins)]
print(Q15)
     castaway_id    full_name short_name date_of_birth date_of_death gender  \
200          201  Oscar Lusth       Ozzy    1981-08-23           NaT   Male   

                 race           ethnicity           occupation  \
200  Mexican American  Hispanic or Latino  Waiter;Photographer   

    personality_type  
200             ISFP  

Q16: What challenge has been played the most number of times in all seasons? To make it easier, a challenge should only count one time for each season it was played even if it was played more than once during a season (although I don't think that is the case for any of these challenges). Select their row from the challenge_description DataFrame and save this as Q16. This should return a DataFrame and the index and missing values should be left as is.

In [37]:
challenge_results['challenge_id'].value_counts()
Out[37]:
CH0631    157
CH0004     84
CH0388     75
CH0765     73
CH0430     70
         ... 
CH0462      1
CH0764      1
CH0695      1
CH0361      1
CH0606      1
Name: challenge_id, Length: 506, dtype: int64
In [38]:
Q16 = challenge_description[challenge_description['challenge_id'] == 'CH0631']
print(Q16)
    challenge_id           challenge_name  puzzle  race  precision  endurance  \
548       CH0631  Blind Leading the Blind   False  True      False      False   

     strength  turn_based  balance   food  knowledge  memory   fire  water  
548     False       False    False  False      False   False  False  False  

Q17: Let's see if the use of hidden immunity idols has increased or decreased over the seasons. Create a Series of the number of hidden idols held per season. The season number should be the index and the values should be the sum of the number of idols that were held. Save this as Q17, sorted by season in ascending order.

In [39]:
Q17 = hidden_idols.notna()
Q17 = hidden_idols.groupby('season')['idols_held'].sum()
print(Q17)
season
11     1
12     1
13     1
14     4
15     3
16     4
17     4
18     3
19     4
20    11
21     4
22     3
23     2
24     3
25     3
26     7
27     3
28     6
29     5
30     3
31     4
32     5
33     7
34     8
35    10
36     9
37     7
38     8
39    13
40    10
Name: idols_held, dtype: int64

Q18: Which contestant held the most number of hidden immunity idols in a single season? Select their row from the castaway_details DataFrame and save this as Q18. This should return a DataFrame and the index and missing values should be left as is.

In [40]:
hidden_idols.groupby(['season','castaway_id'],as_index = False)['idols_held'].sum().sort_values(by = 'idols_held',ascending = False).head(1)
Out[40]:
season castaway_id idols_held
102 38 560 4
In [41]:
Q18 = castaway_details[castaway_details['castaway_id'] == 560]
print(Q18)
     castaway_id    full_name short_name date_of_birth date_of_death gender  \
559          560  Rick Devens       Rick    1984-04-05           NaT   Male   

    race ethnicity           occupation personality_type  
559  NaN       NaN  Morning News Anchor             ENTP  

Q19: What was the largest number of days between when a hidden immunity idol was found and played. Don't count instances with missing values in days found or the days played column. Save the largest number of days as Q19 (as an int).

In [42]:
highest_days = hidden_idols[hidden_idols['day_found'].notna()]
highest_days = hidden_idols[hidden_idols['day_played'].notna()]
highest_days = highest_days.copy()
highest_days['difference'] = hidden_idols['day_played'] - hidden_idols['day_found']
Q19 = highest_days['difference'].max()
Q19 = int(Q19)
print(Q19)
33

Q20: Let's find out which finalist received zero votes from the jury (remember the jury votes are good -- you want jury votes to win the game but these players did not receive any votes).

  • First create a multi-index Series that sums the votes listed in the jury_votes DataFrame grouped by season, then finalist id.
  • From this Series, select only the ones where the sum of the votes are zero.
  • This should be sorted by season in ascending order. Save this as Q20.
In [43]:
votes = jury_votes.groupby(['season','finalist_id'],as_index=False)['vote'].sum()
votes = votes[votes['vote'] == 0]
Q20 = votes.groupby(['season','finalist_id'])['vote'].sum()
print(Q20)
season  finalist_id
13      200            0
14      219            0
        220            0
17      263            0
18      280            0
19      299            0
20      300            0
21      319            0
22      336            0
23      352            0
24      369            0
26      346            0
        396            0
27      10             0
31      419            0
        421            0
32      476            0
33      496            0
        497            0
34      364            0
36      534            0
37      554            0
38      569            0
39      588            0
40      478            0
41      597            0
Name: vote, dtype: int64

Q21: Let's see how many winners ended up getting unanimous jury votes to win the game. Create a Dataframe that shows the survivors that got unanimous jury votes with these columns in the final output: season, season_name, winner_id, full_name. The DataFrame should be sorted by season and the index should go from 0 to n-1. Save this as Q21.

In [44]:
unanimous_list = ['10-0-0','7-0','8-0-0','9-0-0']
Q21 = season_summary[season_summary['final_vote'].isin(unanimous_list)]
Q21 = Q21[['season','season_name','winner_id','full_name']].reset_index(drop=True)
print(Q21)
   season                      season_name  winner_id         full_name
0      14                   Survivor: Fiji        221         Earl Cole
1      18              Survivor: Tocantins        281  James Thomas Jr.
2      26               Survivor: Caramoan        348      John Cochran
3      31               Survivor: Cambodia        433    Jeremy Collins
4      33  Survivor: Millennials vs. Gen X        498        Adam Klein

Q22: Sometimes a contestant might win the game even though they have a lot of other contestants trying to eliminate them. What survivor that won their season had the most votes against them to get voted out during the season (represented as "total_votes_received" from the sole_survivor DataFrame). Select their row from the castaway_details DataFrame and save this as Q22. This should return a DataFrame and the index and missing values should be left as is.

In [45]:
### ENTER CODE HERE ###
combined4.groupby('castaway_id',as_index=False)['total_votes_received'].max().sort_values(by='total_votes_received',ascending=False).head(1)
Out[45]:
castaway_id total_votes_received
33 516 11
In [46]:
Q22 = castaway_details[castaway_details['castaway_id'] == 516]
print(Q22)
     castaway_id       full_name short_name date_of_birth date_of_death  \
515          516  Ben Driebergen        Ben    1983-01-01           NaT   

    gender race ethnicity                           occupation  \
515   Male  NaN       NaN  Marine;Real Estate/Stay-at-home Dad   

    personality_type  
515             ESFP  

Q23: Let's see how many times each country was used as a location. Create a Series where the country name is the index and the total number of times a season was played in that country are the values. Sort in descending order and call this Q23.

In [47]:
Q23 = season_summary.groupby('country')['season'].count()
print(Q23)
country
Australia       1
Brazil          2
Cambodia        2
China           1
Fiji           10
Gabon           1
Guatemala       1
Islands         1
Kenya           1
Malaysia        1
Nicaragua       6
Palau           2
Panama          3
Philippines     4
Polynesia       1
Samoa           2
Thailand        1
Vanuatu         1
Name: season, dtype: int64

Q24: For the castaway_details DataFrame, there is a full_name column and a short_name column. It would be helpful for future analysis to have the contestants first and last name split into separate columns. First copy the castaway_details DataFrame to a new DataFrame called Q24 so that we do not change the original DataFrame.

Create two new columns and add the contestant's first name to a new column called first_name and their last name to a new column called last_name. Add these columns to the Q24 DataFrame. Put the first_name and last_name columns between the full_name and short_name columns.

Note: Be careful as some players have last names with multiple spaces. For example, Lex van den Berghe. You should code Lex as his first name and van den Berghe as his last name.

In [48]:
Q24 = castaway_details.copy()
Q24[['first_name', 'last_name']] = Q24['full_name'].str.split(" ", 1, expand=True)
Q24 = Q24[['castaway_id', 'full_name', 'first_name', 'last_name', 'short_name','date_of_birth','date_of_death','gender','race'
          ,'ethnicity','occupation','personality_type']]
print(Q24)
     castaway_id          full_name first_name    last_name short_name  \
0              1  Sonja Christopher      Sonja  Christopher      Sonja   
1              2      B.B. Anderson       B.B.     Anderson       B.B.   
2              3    Stacey Stillman     Stacey     Stillman     Stacey   
3              4        Ramona Gray     Ramona         Gray     Ramona   
4              5          Dirk Been       Dirk         Been       Dirk   
..           ...                ...        ...          ...        ...   
603          604      Tiffany Seely    Tiffany        Seely    Tiffany   
604          605       Sydney Segal     Sydney        Segal     Sydney   
605          606      Shantel Smith    Shantel        Smith       Shan   
606          607         David Voce      David         Voce       Voce   
607          608      Liana Wallace      Liana      Wallace      Liana   

    date_of_birth date_of_death  gender   race ethnicity  \
0      1937-01-28           NaT  Female    NaN       NaN   
1      1936-01-18    2013-10-29    Male    NaN       NaN   
2      1972-08-11           NaT  Female    NaN       NaN   
3      1971-01-20           NaT  Female  Black       NaN   
4      1976-06-15           NaT    Male    NaN       NaN   
..            ...           ...     ...    ...       ...   
603    1973-12-08           NaT  Female  White    Jewish   
604    1995-07-19           NaT  Female  White    Jewish   
605    1987-03-11           NaT  Female  Black       NaN   
606    1986-05-01           NaT    Male    NaN       NaN   
607    2000-10-25           NaT  Female  Black    Jewish   

                occupation personality_type  
0                 Musician             ENFP  
1    Real Estate Developer             ESTJ  
2                 Attorney             ENTJ  
3       Biochemist/Chemist             ISTJ  
4             Dairy Farmer             ISFP  
..                     ...              ...  
603                Teacher             ENTP  
604            Law Student             ESTP  
605                 Pastor             ENFJ  
606           Neurosurgeon             ENTJ  
607        College Student             ESTJ  

[608 rows x 12 columns]
In [49]:
Q24[Q24['full_name'] == 'Lex van den Berghe']
Out[49]:
castaway_id full_name first_name last_name short_name date_of_birth date_of_death gender race ethnicity occupation personality_type
45 46 Lex van den Berghe Lex van den Berghe Lex 1963-06-18 NaT Male NaN NaN Marketing Manager ENTJ

Q25: Let's say that we have a theory that contestants that stay in their original tribe when the tribes are swapped have a better chance at winning their season.

  • First, from the sole_survivor DataFrame that you created earlier, drop any castaways that have a missing value in the "swapped tribe" column. Use a copy of the data so that you do not drop rows from the original DataFrame.
  • Then, select all the winners where their swapped tribe (swapped tribe #1) was the same as their original tribe. Save this as Q25A.
  • Then select all the winners where their swapped tribe was different than their original tribe. Call this Q25B.

Note: In reality, it is more complicated than this as you would really want to see how many contestants ended up with the majority of their original tribe on the swapped tribe regardless of whether their tribe stayed the same. You are welcome to research this on your own for further practice.

Remember: Don't change any of the original DataFrames or CodeGrade will not work correctly for this assignment. Make sure you use copy() if needed.

In [50]:
cond = sole_survivor.copy()
cond = cond[cond['swapped_tribe'].notna()]
Q25A = cond[cond['swapped_tribe'] == cond['original_tribe']]
print(Q25A)
                       season_name  season          full_name  castaway_id  \
714               Survivor: Africa       3         Ethan Zohn           48   
666           Survivor: The Amazon       6      Jenna Morasca           96   
648        Survivor: Pearl Islands       7  Sandra Diaz-Twine          112   
630            Survivor: All-Stars       8       Amber Brkich           27   
612              Survivor: Vanuatu       9    Chris Daugherty          130   
503                Survivor: China      15        Todd Herzog          237   
465                Survivor: Gabon      17     Robert Crowley          265   
389            Survivor: Nicaragua      21          Jud Birza          321   
331            Survivor: One World      24       Kim Spradlin          371   
293             Survivor: Caramoan      26       John Cochran          348   
234     Survivor: San Juan del Sur      29   Natalie Anderson          442   
216         Survivor: Worlds Apart      30      Mike Holloway          460   
196             Survivor: Cambodia      31     Jeremy Collins          433   
100         Survivor: Ghost Island      36    Wendell Holland          536   
40   Survivor: Island of the Idols      39      Tommy Sheehan          590   
18        Survivor: Winners at War      40       Tony Vlachos          424   

    castaway  age                  city          state personality_type  \
714    Ethan   27             Lexington  Massachusetts             ISFP   
666    Jenna   21           Bridgeville   Pennsylvania             ISTP   
648   Sandra   29            Fort Lewis     Washington             ESTP   
630    Amber   25                Beaver   Pennsylvania             ISFP   
612    Chris   33          South Vienna           Ohio             ENTP   
503     Todd   22        Pleasant Grove           Utah             ENFP   
465      Bob   57        South Portland          Maine             INTP   
389    Fabio   21                Venice     California             ESFP   
331      Kim   29           San Antonio          Texas             INFJ   
293  Cochran   25            Washington           D.C.             INTP   
234  Natalie   28             Edgewater     New Jersey             ESTP   
216     Mike   38  North Richland Hills          Texas             ESTP   
196   Jeremy   37               Foxboro  Massachusetts             ESTJ   
100  Wendell   33          Philadelphia   Pennsylvania             INFJ   
40     Tommy   26            Long Beach       New York             ENFJ   
18      Tony   45             Allendale     New Jersey             ESTP   

     episode  day  order         result jury_status original_tribe  \
714       15   39     16  Sole Survivor         NaN          Boran   
666       15   39     16  Sole Survivor         NaN         Jaburu   
648       15   39     18  Sole Survivor         NaN          Drake   
630       17   39     18  Sole Survivor         NaN        Chapera   
612       15   39     18  Sole Survivor         NaN         Lopevi   
503       15   39     16  Sole Survivor         NaN       Fei Long   
465       14   39     18  Sole Survivor         NaN           Kota   
389       16   39     20  Sole Survivor         NaN        La Flor   
331       15   39     18  Sole Survivor         NaN         Salani   
293       15   39     20  Sole Survivor         NaN          Bikal   
234       15   39     18  Sole Survivor         NaN        Hunahpu   
216       15   39     18  Sole Survivor         NaN       Escameca   
196       15   39     20  Sole Survivor         NaN          Bayon   
100       14   39     20  Sole Survivor         NaN         Naviti   
40        14   39     20  Sole Survivor         NaN          Vokai   
18        15   39     22  Sole Survivor         NaN          Dakal   

    swapped_tribe swapped_tribe_2  merged_tribe  total_votes_received  \
714         Boran             NaN     Moto Maji                     0   
666        Jaburu             NaN        Jacaré                     3   
648         Drake             NaN        Balboa                     0   
630       Chapera         Chapera  Chaboga Mogo                     6   
612        Lopevi             NaN        Alinta                     3   
503      Fei Long             NaN   Hae Da Fung                     5   
465          Kota            Kota         Nobag                     2   
389       La Flor             NaN      Libertad                     2   
331        Salani             NaN       Tikiano                     3   
293         Bikal             NaN     Enil Edam                     0   
234       Hunahpu             NaN        Huyopa                     0   
216      Escameca             NaN        Merica                     4   
196         Bayon           Bayon         Orkun                     3   
100        Naviti          Yanuya        Lavita                     5   
40          Vokai             NaN      Lumuwaku                     2   
18          Dakal             NaN          Koru                     0   

     immunity_idols_won  
714                   4  
666                   7  
648                   3  
630                   6  
612                   6  
503                   4  
465                   8  
389                   8  
331                   6  
293                   7  
234                   6  
216                   7  
196                   7  
100                   8  
40                    6  
18                    9  
In [51]:
Q25B = cond[cond['swapped_tribe'] != cond['original_tribe']]
print(Q25B)
                                   season_name  season           full_name  \
698                        Survivor: Marquesas       4      Vecepia Towery   
574                        Survivor: Guatemala      11    Danni Boatwright   
558                           Survivor: Panama      12     Aras Baskauskas   
538                     Survivor: Cook Islands      13            Yul Kwon   
519                             Survivor: Fiji      14           Earl Cole   
483                       Survivor: Micronesia      16     Parvati Shallow   
313                      Survivor: Philippines      25      Denise Stapley   
270                  Survivor: Blood vs. Water      27       Tyson Apostol   
252                          Survivor: Cagayan      28        Tony Vlachos   
178                        Survivor: Kaoh Rong      32  Michele Fitzgerald   
158            Survivor: Millennials vs. Gen X      33          Adam Klein   
138                    Survivor: Game Changers      34        Sarah Lacina   
120  Survivor: Heroes vs. Healers vs. Hustlers      35      Ben Driebergen   
80                 Survivor: David vs. Goliath      37         Nick Wilson   

     castaway_id castaway  age           city       state personality_type  \
698           64  Vecepia   36        Hayward  California             ISTJ   
574          166    Danni   29     Tonganoxie      Kansas             ENFJ   
558          182     Aras   24   Santa Monica  California             INFP   
538          202      Yul   31      San Mateo  California             INTJ   
519          221     Earl   35   Santa Monica  California             INFJ   
483          197  Parvati   25    Los Angeles  California             ENFJ   
313          386   Denise   41   Cedar Rapids        Iowa             INFJ   
270          274    Tyson   34          Provo        Utah             ESTP   
252          424     Tony   39    Jersey City  New Jersey             ESTP   
178          478  Michele   24       Freehold  New Jersey             ESFP   
158          498     Adam   25  San Francisco  California             ENTP   
138          414    Sarah   32         Marion        Iowa             ENTJ   
120          516      Ben   34          Boise       Idaho             ESFP   
80           556     Nick   27   Williamsburg    Kentucky             ENTP   

     episode  day  order         result jury_status original_tribe  \
698       15   39     16  Sole Survivor         NaN        Maraamu   
574       15   39     18  Sole Survivor         NaN          Nakúm   
558       16   39     16  Sole Survivor         NaN        Viveros   
538       16   39     20  Sole Survivor         NaN      Puka Puka   
519       15   39     19  Sole Survivor         NaN           Ravu   
483       15   39     20  Sole Survivor         NaN        Malakal   
313       15   39     18  Sole Survivor         NaN        Matsing   
270       15   39     23  Sole Survivor         NaN         Galang   
252       14   39     18  Sole Survivor         NaN         Aparri   
178       15   39     18  Sole Survivor         NaN         Gondol   
158       14   39     20  Sole Survivor         NaN          Vanua   
138       14   39     20  Sole Survivor         NaN           Nuku   
120       14   39     18  Sole Survivor         NaN           Levu   
80        14   39     20  Sole Survivor         NaN          David   

    swapped_tribe swapped_tribe_2 merged_tribe  total_votes_received  \
698          Rotu             NaN     Soliantu                     2   
574         Yaxhá             NaN       Xhakúm                     1   
558        Casaya             NaN      Gitanos                     9   
538      Aitutaki        Aitutaki    Aitutonga                     5   
519          Moto             NaN    Bula Bula                     1   
483         Airai             NaN         Dabu                     4   
313       Kalabaw             NaN    Dangrayne                     6   
270       Tadhana             NaN       Kasama                     2   
252        Solana             NaN    Solarrion                     5   
178      Chan Loh             NaN         Dara                     1   
158        Takali             NaN       Vinaka                     6   
138         Tavua             NaN    Maku Maku                     0   
120          Yawa             NaN       Solewa                    11   
80         Jabeni             NaN     Kalokalo                     0   

     immunity_idols_won  
698                   4  
574                   5  
558                   4  
538                   6  
519                   5  
483                   7  
313                   1  
270                   8  
252                   6  
178                   7  
158                   6  
138                   4  
120                   4  
80                    4  

Q26: Let's say that we wanted to predict a contestants personality type based on the information in the data files. Your task is to create a DataFrame that lists the castaway_id, full_name and personality_type for each castaway contestant. However, since most machine learning algorithms use numeric data, you want to change the personality types to the following numbers:

  • ISTJ - 1
  • ISTP - 2
  • ISFJ - 3
  • ISFP - 4
  • INFJ - 5
  • INFP - 6
  • INTJ - 7
  • INTP - 8
  • ESTP - 9
  • ESTJ - 10
  • ESFP - 11
  • ESFJ - 12
  • ENFP - 13
  • ENFJ - 14
  • ENTP - 15
  • ENTJ - 16
  • Missing values - 17

Save this new DataFrame as Q26 and sort based on castaway_id in ascending order.

In [52]:
Q26 = castaway_details.copy()
Q26 = Q26[['castaway_id','full_name','personality_type']]
Q26["personality_type"].replace({"ISTJ": 1, "ISTP": 2, "ISTP": 2, "ISFJ": 3, "ISFP": 4, "INFJ": 5, "INFP": 6,
                                "INTJ": 7, "INTP": 8, "ESTP": 9, "ESTJ": 10, "ESFP": 11, "ESFJ": 12, "ENFP": 13,
                                "ENFJ": 14, "ENTP": 15, "ENTJ": 16}, inplace=True)
Q26['personality_type'] = Q26['personality_type'].fillna(17)
Q26['personality_type'] = Q26['personality_type'].astype('int')
print(Q26)
     castaway_id          full_name  personality_type
0              1  Sonja Christopher                13
1              2      B.B. Anderson                10
2              3    Stacey Stillman                16
3              4        Ramona Gray                 1
4              5          Dirk Been                 4
..           ...                ...               ...
603          604      Tiffany Seely                15
604          605       Sydney Segal                 9
605          606      Shantel Smith                14
606          607         David Voce                16
607          608      Liana Wallace                10

[608 rows x 3 columns]

Q27: After thinking about this some more, you realize that you don't want to code the personality traits as you did in problem 26 since the data is not ordinal. Some machine learning algorithms will assume that numbers close to each other are more alike than those that are away from each other and that is not the case with these personality types.

Let's create a new DataFrame called Q27 that creates dummy columns (using get_dummies) for the original personality type column. Add a prefix called "type" and drop the first column to help prevent multicollinearity. The columns should be castaway_id, full_name followed by the various dummy columns for the personality types. Don't worry about any missing values in this step.

Remember: Don't change any of the original DataFrames or CodeGrade will not work correctly for this assignment. Make sure you use copy() if needed.

In [53]:
Q27 = castaway_details.copy()
Q27 = Q27[['castaway_id','full_name','personality_type']]
Q27 = pd.get_dummies(data=Q27,columns=['personality_type'], prefix='type',drop_first = True)
print(Q27)
     castaway_id          full_name  type_ENFP  type_ENTJ  type_ENTP  \
0              1  Sonja Christopher          1          0          0   
1              2      B.B. Anderson          0          0          0   
2              3    Stacey Stillman          0          1          0   
3              4        Ramona Gray          0          0          0   
4              5          Dirk Been          0          0          0   
..           ...                ...        ...        ...        ...   
603          604      Tiffany Seely          0          0          1   
604          605       Sydney Segal          0          0          0   
605          606      Shantel Smith          0          0          0   
606          607         David Voce          0          1          0   
607          608      Liana Wallace          0          0          0   

     type_ESFJ  type_ESFP  type_ESTJ  type_ESTP  type_INFJ  type_INFP  \
0            0          0          0          0          0          0   
1            0          0          1          0          0          0   
2            0          0          0          0          0          0   
3            0          0          0          0          0          0   
4            0          0          0          0          0          0   
..         ...        ...        ...        ...        ...        ...   
603          0          0          0          0          0          0   
604          0          0          0          1          0          0   
605          0          0          0          0          0          0   
606          0          0          0          0          0          0   
607          0          0          1          0          0          0   

     type_INTJ  type_INTP  type_ISFJ  type_ISFP  type_ISTJ  type_ISTP  
0            0          0          0          0          0          0  
1            0          0          0          0          0          0  
2            0          0          0          0          0          0  
3            0          0          0          0          1          0  
4            0          0          0          1          0          0  
..         ...        ...        ...        ...        ...        ...  
603          0          0          0          0          0          0  
604          0          0          0          0          0          0  
605          0          0          0          0          0          0  
606          0          0          0          0          0          0  
607          0          0          0          0          0          0  

[608 rows x 17 columns]

Q28: After running your data above through your machine learning model, you determine that a better prediction might come from breaking the personality type into its four parts (one part for each character in the type). Your task is now to create a DataFrame called Q28 that splits the personality type into the various parts and creates a new column for each part (these columns should be called interaction that will represent the first letter in the personality type, information for the second letter, decision for the third, and organization for the fourth).

Again, since most machine learning algorithms work with numeric data, perform the following on the four new columns:

  • interaction --> code all I's as 0 and E's as 1
  • information --> code all S's as 0 and N's as 1
  • decision --> code all T's as 0 and F's as 1
  • organization --> code as J's with 0 and P's as 1
  • Any missing values should be coded with a 2

For example, if a contestant's personality type was ENTJ, your columns for that row would be:

  • 1 for interaction because of the E
  • 1 for information because of the N
  • 0 for decision because of the T
  • 0 for organization because of the J

The new DataFrame should be sorted in castaway_id order and have the following columns in this order: castaway_id, full_name, personality_type, interaction, information, decision, organization.

Remember: Don't change any of the original DataFrames or CodeGrade will not work correctly for this assignment. Make sure you use copy() if needed.

In [54]:
Q28 = castaway_details.copy()
Q28 = Q28[['castaway_id','full_name','personality_type']]
Q28[['0','interaction', 'information', 'decision', 'organization','1']] = Q28['personality_type'].str.split('', expand=True)
Q28 = Q28.drop(['0','1'], axis = 1)
Q28["interaction"].replace({"E": '1', "I": '0'}, inplace=True)
Q28["information"].replace({"N": '1', "S": '0'}, inplace=True)
Q28["decision"].replace({"F": '1', "T": '0'}, inplace=True)
Q28["organization"].replace({"P": '1', "J": '0'}, inplace=True)
print(Q28)
     castaway_id          full_name personality_type interaction information  \
0              1  Sonja Christopher             ENFP           1           1   
1              2      B.B. Anderson             ESTJ           1           0   
2              3    Stacey Stillman             ENTJ           1           1   
3              4        Ramona Gray             ISTJ           0           0   
4              5          Dirk Been             ISFP           0           0   
..           ...                ...              ...         ...         ...   
603          604      Tiffany Seely             ENTP           1           1   
604          605       Sydney Segal             ESTP           1           0   
605          606      Shantel Smith             ENFJ           1           1   
606          607         David Voce             ENTJ           1           1   
607          608      Liana Wallace             ESTJ           1           0   

    decision organization  
0          1            1  
1          0            0  
2          0            0  
3          0            0  
4          1            1  
..       ...          ...  
603        0            1  
604        0            1  
605        1            0  
606        0            0  
607        0            0  

[608 rows x 7 columns]

Q29: Using data from castaways, create a DataFrame called Q29 that bins the contestant ages (their age when they were on the season, not their current age) into the following age categories:

  • 18-24
  • 25-34
  • 35-44
  • 45-54
  • 55-64
  • 65+

The final DataFrame should have the following columns in this order: season, castaway_id, full_name, age, and age_category. The DataFrame should be sorted by age and then castaway_id. The index should be 0 through n-1. You should have the same amount of rows as in the castaways DataFrame.

Remember: Don't change any of the original DataFrames or CodeGrade will not work correctly for this assignment. Make sure you use copy() if needed.

In [55]:
bins = [17,24,34,44,54,64,100]
group_names = ["18-24",'25-34','35-44','45-54','55-64','65+']
Q29 = castaways.copy()
Q29['age_category'] = pd.cut(Q29['age'], bins, labels = group_names)
Q29 = Q29[['season','castaway_id','full_name','age','age_category']].sort_values(by = ['age','castaway_id']).reset_index(drop = True)
print(Q29)
     season  castaway_id          full_name  age age_category
0        33          491          Will Wahl   18        18-24
1        36          528     Michael Yerger   18        18-24
2        18          270       Spencer Duhm   19        18-24
3        22          336  Natalie Tenerelli   19        18-24
4        23          350      Brandon Hantz   19        18-24
..      ...          ...                ...  ...          ...
757      24          366         Greg Smith   64        55-64
758      21          304      Jimmy Johnson   67          65+
759      32          474   Joseph Del Campo   71          65+
760       1           14        Rudy Boesch   72          65+
761       8           14        Rudy Boesch   75          65+

[762 rows x 5 columns]

Q30: Based on the age categories you created above, what are the normalized percentages for the various age categories using value_counts(). Sort the value counts by index. Save this as Q30.

In [56]:
Q30= Q29['age_category'].value_counts(normalize=True).sort_index()
print(Q30)
18-24    0.190289
25-34    0.437008
35-44    0.211286
45-54    0.124672
55-64    0.031496
65+      0.005249
Name: age_category, dtype: float64

Q31: Which contestant(s) played a perfect game? A perfect game is considered when the contestant:

  • didn't receive any tribal council votes all season
  • won the game
  • got unanimous jury votes (see question 22)

Save this DataFrame as Q31 with the following columns: season_name, season, castaway_id, full_name, tribal_council_votes, jury_votes. The DataFrame should be sorted by season and the index should be 0 to n-1.

In [57]:
unanimous_season = [14,18,26,31,33]
perfect_game = combined4[(combined4['total_votes_received'] == 0)]
Q31 = perfect_game[perfect_game['season'].isin(unanimous_season)]
Q31 = Q31.merge(season_summary, how='inner', left_on= 'season', right_on='season')
Q31 = Q31[['season_name_x','season','castaway_id','full_name_x','total_votes_received','final_vote']]
Q31 = Q31.rename(columns={'season_name_x':'season_name','full_name_x':'full_name','total_votes_received':'tribal_council_votes','final_vote':'jury_votes'})
Q31['jury_votes'] = Q31['jury_votes'].str.split('-').str[0]
print(Q31)
           season_name  season  castaway_id         full_name  \
0  Survivor: Tocantins      18          281  James Thomas Jr.   
1   Survivor: Caramoan      26          348      John Cochran   

   tribal_council_votes jury_votes  
0                     0          7  
1                     0          8  
In [ ]: